Merge music branch to develop. #56

vlachvojta · 2023-11-03T17:02:29Z

Summary of changes in pull request #56 music->develop

YOLO + Non-text regions
Export + refactor of AltoXML and PageXML
Music (optical music recognition)
Changes in .ini config for parse_folder.py

YOLO + Non-text regions

Added LayoutExtractorYolo, a layout parser that uses a YOLO model to detect "non-text" regions (e.g. images, tables).
Added a category attribute to RegionLayout and TextLine. It stores the YOLO output class, or 'text' for the original layout parsers.
To use LayoutExtractorYolo and YOLO inference, you have to install the ultralytics library manually (it causes library version conflicts when imported together with other libraries).

See code in: pero_ocr/document_ocr/page_parser.py/LayoutExtractorYolo

Export + refactor of AltoXML and PageXML

Added an Enum for ALTO versions, which differ in baseline format:
- in v2.x the baseline is a single float (mean of the Y baseline coordinates)
- in v4.4 the baseline is a string in x1,y1 x2,y2 ... format (same as PageXML)
Moved the export and import functions into the specific classes (TextRegion + TextLine) for better readability and maintainability.

See code in: pero_ocr/core/layout.py/*.(to|from)_(page|alto)xml()
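A minimal sketch of the two baseline formats described above. The names `AltoVersion` and `format_baseline` are made up for illustration and are not the identifiers used in `layout.py`; the version split follows the v2.x / v4.4 wording of this description.

```python
from enum import Enum
from typing import List, Tuple, Union


class AltoVersion(Enum):
    """Hypothetical enum; the real one in pero_ocr may be named differently."""
    V2_X = "2.x"
    V4_4 = "4.4"


def format_baseline(points: List[Tuple[int, int]],
                    version: AltoVersion) -> Union[float, str]:
    """Return the baseline in the format expected by the given ALTO version."""
    if version is AltoVersion.V2_X:
        # Older ALTO: a single float, here the mean of all Y coordinates.
        return sum(y for _, y in points) / len(points)
    # Newer ALTO: a points string "x1,y1 x2,y2 ...", same as PageXML.
    return " ".join(f"{x},{y}" for x, y in points)
```

With the points `[(0, 10), (100, 20)]`, the old format gives a single number and the new format keeps every point.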

Music (optical music recognition)

Independent scripts for working with music staff transcriptions from PageLayout, located in the music folder.
Simple CLI for exporting music from PageLayout to MusicXML and/or MIDI in user_scripts/export_music.py

Changes in .ini config for `parse_folder.py`

Section names

Users can now define more than one Layout parser, Line Cropper and OCR section, using the new naming format:

LAYOUT_PARSER_\d+ or LAYOUT_PARSER (updated)
LINE_CROPPER_\d+ or LINE_CROPPER
OCR_\d+ or OCR

See code in: pero_ocr/document_ocr/page_parser.py/PageParser.init_config_sections()
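As an illustration of the naming format above, the following sketch groups section names by kind using the three patterns listed. `SECTION_PATTERNS` and `group_sections` are hypothetical helpers, not the code of `init_config_sections()`.

```python
import re
from typing import Dict, Iterable, List

# Patterns taken from the naming format above: NAME or NAME_<number>.
SECTION_PATTERNS = {
    "LAYOUT_PARSER": re.compile(r"^LAYOUT_PARSER(_\d+)?$"),
    "LINE_CROPPER": re.compile(r"^LINE_CROPPER(_\d+)?$"),
    "OCR": re.compile(r"^OCR(_\d+)?$"),
}


def group_sections(section_names: Iterable[str]) -> Dict[str, List[str]]:
    """Sort config section names into the three engine kinds; ignore the rest."""
    groups: Dict[str, List[str]] = {kind: [] for kind in SECTION_PATTERNS}
    for name in section_names:
        for kind, pattern in SECTION_PATTERNS.items():
            if pattern.match(name):
                groups[kind].append(name)
                break
    return groups
```

Names such as `OCR_x` do not match any pattern and are skipped in this sketch.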

Execution order example
Sections are executed in alphabetical order (first executing all Layout parsers and then pairs of Line Cropper with corresponding OCR).

Defined sections: LAYOUT_PARSER_1 LAYOUT_PARSER_2 LAYOUT_PARSER_3 LINE_CROPPER_1 LINE_CROPPER_2 OCR_1 OCR_2
sort ==>
Execution order: LAYOUT_PARSER_1 LAYOUT_PARSER_2 LAYOUT_PARSER_3 LINE_CROPPER_1 OCR_1 LINE_CROPPER_2 OCR_2

See code in: pero_ocr/document_ocr/page_parser.py/PageParser.process_page()
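The ordering rule can be sketched as follows. This is only an illustration, assuming each Line Cropper is paired with the OCR section that has the same suffix; `execution_order` is a hypothetical helper and not the logic of `process_page()`.

```python
from typing import Iterable, List


def execution_order(section_names: Iterable[str]) -> List[str]:
    """Layout parsers first, then each Line Cropper followed by its OCR."""
    names = list(section_names)
    layout_parsers = sorted(n for n in names if n.startswith("LAYOUT_PARSER"))
    line_croppers = sorted(n for n in names if n.startswith("LINE_CROPPER"))
    # Map suffix ("" or "_<number>") to the OCR section name.
    ocr_by_suffix = {n[len("OCR"):]: n for n in names if n.startswith("OCR")}

    order = list(layout_parsers)
    for cropper in line_croppers:
        order.append(cropper)
        suffix = cropper[len("LINE_CROPPER"):]
        if suffix in ocr_by_suffix:  # assumption: pair by identical suffix
            order.append(ocr_by_suffix[suffix])
    return order
```

Because plain alphabetical sorting is used, a suffix such as `_10` sorts before `_2`. An OCR section without a matching Line Cropper is left out in this sketch.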

New attributes

LINE_CATEGORIES (list, [] by default) attribute for LAYOUT_PARSER:

Applies to these categories (plus 'text' by default; [] means all categories).
After LayoutExtractorYolo creates a RegionLayout object of one of these categories, it also creates TextLine objects with the same category (otherwise the RegionLayout is left empty, with no TextLine objects).
Useful for creating TextLine objects for music regions. It does not make sense for non-text regions (e.g. images), as they have no transcriptions.

See code in: pero_ocr/document_ocr/page_parser.py/LayoutExtractorYolo.process_page()
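One possible reading of the LINE_CATEGORIES rule, written as a small predicate. `creates_text_lines` is a hypothetical helper; the actual check in `LayoutExtractorYolo.process_page()` may be structured differently.

```python
from typing import List


def creates_text_lines(region_category: str, line_categories: List[str]) -> bool:
    """Decide whether a detected region should also get TextLine objects."""
    if not line_categories:
        # Empty list: create lines for every category.
        return True
    # 'text' is always included, in addition to the configured categories.
    return region_category == "text" or region_category in line_categories
```

For example, with `LINE_CATEGORIES = ['music']` a music region gets lines, while an image region stays empty.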

CATEGORIES (list, [] by default) attribute for LINE_CROPPER and OCR sections:

Apply LINE_CROPPER and OCR engines only to TextLine objects with these categories ([] means all).
The page layout is filtered and then merged again in every process_page call using split_page_layout_by_categories and merge_page_layouts.

See code in: pero_ocr/layout_engines/layout_helpers.py/split_page_layout_by_categories()
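The split-then-merge idea can be sketched on a flat list of lines. The `Line` class and both functions below are simplified stand-ins for illustration only; the real helpers operate on PageLayout objects and their signatures are not shown in this PR description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:
    """Simplified stand-in for TextLine (hypothetical)."""
    id: str
    category: Optional[str] = None


def split_by_categories(lines: List[Line],
                        categories: List[str]) -> Tuple[List[Line], List[Line]]:
    """Return (lines to process, lines to leave untouched)."""
    if not categories:
        # Empty list: the engine processes everything.
        return list(lines), []
    selected = [line for line in lines if line.category in categories]
    rest = [line for line in lines if line.category not in categories]
    return selected, rest


def merge(selected: List[Line], rest: List[Line],
          original_order: List[str]) -> List[Line]:
    """Put both parts back together in the original line order."""
    by_id = {line.id: line for line in selected + rest}
    return [by_id[line_id] for line_id in original_order]
```

An engine would run only on the first part, after which both parts are merged so that the output page keeps all lines.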

SUBSTITUTE_OUTPUT (bool, yes by default) attribute for OCR section:

If yes, substitute the output of the OCR engine using output_substitution_table in OCR_JSON (the OCR engine configuration), a key->value dictionary substitution applied to each symbol.
Symbols are obtained by splitting the line on whitespace.

See code in: pero_ocr/document_ocr/page_parser.py/PageOCR.substitute_transcriptions()
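A minimal sketch of the substitution, assuming a plain dict as the table. `substitute_line` is a hypothetical helper and the table entries in the example are made up.

```python
from typing import Dict


def substitute_line(line: str, table: Dict[str, str]) -> str:
    """Replace every whitespace-separated symbol using the table.

    Raises KeyError when a symbol is missing from the table.
    """
    return " ".join(table[symbol] for symbol in line.split())
```

With the table `{"a": "alpha", "b": "beta"}`, the line `"a b a"` becomes `"alpha beta alpha"`.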

SUBSTITUTE_OUTPUT_ATOMIC (bool, no by default) attribute for OCR section:

If yes, translation is done atomically at the page level: either all lines are translated or none.
If no, translation is done on a best-effort basis: lines are translated independently, and a line that fails is left untranslated.

See code in: pero_ocr/document_ocr/page_parser.py/PageOCR.substitute_transcriptions()
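The difference between the two modes, sketched under the assumption that a line "fails" when it contains a symbol missing from the table. `substitute_page` is a hypothetical function, not the body of `substitute_transcriptions()`.

```python
from typing import Dict, List, Optional


def substitute_page(lines: List[str], table: Dict[str, str],
                    atomic: bool = False) -> List[str]:
    """Translate all lines of a page, atomically or best-effort."""

    def translate(line: str) -> Optional[str]:
        symbols = line.split()
        if any(symbol not in table for symbol in symbols):
            return None  # assumption: an unknown symbol means failure
        return " ".join(table[symbol] for symbol in symbols)

    translated = [translate(line) for line in lines]
    if atomic:
        # All or nothing: one failed line keeps the whole page untranslated.
        if any(result is None for result in translated):
            return list(lines)
        return [result for result in translated if result is not None]
    # Best effort: keep the original text only for lines that failed.
    return [result if result is not None else line
            for result, line in zip(translated, lines)]
```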

UPDATE_TRANSCRIPTION_BY_CONFIDENCE (bool, no by default) attribute for OCR section:

If yes, update line transcription only if the new transcription has higher confidence.
If no, update transcription always.
Can be used for transcribing lines with multiple OCR engines to get the best result.

See code in: pero_ocr/document_ocr/page_parser.py/PageOCR.process_page()
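The update rule in a few lines, as a sketch only. `pick_transcription` is a hypothetical helper, and treating a missing old confidence as "always update" is an assumption.

```python
from typing import Optional, Tuple


def pick_transcription(old_text: Optional[str], old_conf: Optional[float],
                       new_text: str, new_conf: float,
                       by_confidence: bool) -> Tuple[Optional[str], Optional[float]]:
    """Return the (text, confidence) pair the line should keep."""
    if by_confidence and old_conf is not None and new_conf <= old_conf:
        # The new result is not better, so keep the previous transcription.
        return old_text, old_conf
    return new_text, new_conf
```

Running two OCR sections over the same lines with the option enabled keeps, per line, whichever result had the higher confidence.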

... see commit messages for more ...


vlachvojta · 2024-06-14T12:05:01Z

TODOs from MartinK.

ALTO export baseline (all points, not just mean) using PointsType
- ALTO version <4.2: original mean
- ALTO version >=4.2: new PointsType
eval_ocr_pipeline_xml.py
- what is wrong_order in eval_ocr_pipeline_xml.py: wrong order of lines within a region (are the successors of mapped lines in region.lines also mapped?); mapping is done by Levenshtein distance here
- test "wrong order numbers" with naive sorter - very similar
- confidence distance stats
  - code is done; statistically, the differences are the same as when running the same code twice

Results from eval_ocr_pipeline_xml.py:

develop vs develop second run

develop vs music branch

{
  "gt_sum_char": 435306,
  "gt_lines_count": 15084,
  "good_lines": 12976,
  "unmapped_gt_lines": 0,
  "unmapped_gt_chars": 0,
  "unmapped_input_lines": 1,
  "unmapped_input_chars": 0,
  "mapped_char_errors": 32,
  "wrong_order": 2108,
  "wrong_line_transcriptions": 0,
  "good_order": 12976,
  "non_zero_confidence_distances_len": 32,
  "total_files": 99,
  "non_zero_confidence_distance_files": 8,
  "confidence_distances": {
    "max": 0.3460000000000001,
    "min": 0,
    "mean": 0.0002338933415536375,
    "median": 0,
    "std": 0.007068152540888958,
    "non_zero_count": 32,
    "total": 12976
  }
}

{
  "gt_sum_char": 435306,
  "gt_lines_count": 15084,
  "good_lines": 13070,
  "unmapped_gt_lines": 0,
  "unmapped_gt_chars": 0,
  "unmapped_input_lines": 0,
  "unmapped_input_chars": 0,
  "mapped_char_errors": 3,
  "wrong_order": 2014,
  "wrong_line_transcriptions": 0,
  "good_order": 13070,
  "non_zero_confidence_distances_len": 9,
  "total_files": 99,
  "non_zero_confidence_distance_files": 5,
  "confidence_distances": {
    "max": 0.33499999999999996,
    "min": 0,
    "mean": 5.9066564651874526e-05,
    "median": 0,
    "std": 0.004118486144981323,
    "non_zero_count": 9,
    "total": 13070
  }
}
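For reference, statistics with the same keys as `confidence_distances` could be computed roughly like this. It is a sketch, assuming the distance is the absolute difference of per-line confidences between two runs and that `std` is the population standard deviation; the actual computation in `eval_ocr_pipeline_xml.py` is not shown here.

```python
import statistics
from typing import Dict, List


def confidence_distance_stats(conf_a: List[float],
                              conf_b: List[float]) -> Dict[str, float]:
    """Summarise per-line confidence differences between two runs."""
    distances = [abs(a - b) for a, b in zip(conf_a, conf_b)]
    return {
        "max": max(distances),
        "min": min(distances),
        "mean": statistics.fmean(distances),
        "median": statistics.median(distances),
        "std": statistics.pstdev(distances),  # assumption: population std
        "non_zero_count": sum(1 for d in distances if d != 0),
        "total": len(distances),
    }
```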


…port options.
- Versions older than 4.2 define the baseline as a simple float (the baseline is exported as the mean of all Y baseline points).
- Version 4.2 and newer define the baseline as a PointsType string with the recommended format: "x1,y1 x2,y2 ..."


1) Remove `ultralytics` and `music21` from the dependencies of the whole project; the user will have to install them when needed.
2) Import `ultralytics` only when needed, so it doesn't cause an import error with specific numpy versions. Ultralytics has this dependency right now: "numpy>=1.23.5,<2.0.0". See the current state at [github.com/ultralytics/ultralytics/blob/main/pyproject.toml](https://github.com/ultralytics/ultralytics/blob/69cfc8aa228dbf1267975f82fcae9a24665f23b9/pyproject.toml#L67)
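Importing an optional library only when it is needed can be sketched with a small generic helper. `require_optional` is a hypothetical name used for illustration and is not part of pero_ocr.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency lazily, with a helpful error message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is needed for {purpose}; "
            f"install it manually, e.g. `pip install {module_name}`."
        ) from err
```

A YOLO layout parser could then call `require_optional("ultralytics", "YOLO layout detection")` inside its constructor, so that users who never configure it are not affected by the numpy constraint.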


ikiss-fit · 2024-07-23T13:06:45Z

@vlachvojta I have found a few bugs that should be resolved:

The condition in layout_engines/smart_sorter.py:286 is wrong: it causes problems when the PageLayout doesn't contain any region of the category the sorter is sorting, because it returns an empty PageLayout. The condition should be "if the split PageLayout contains at least one region, then the sorting happens".
The lengths variable in music/music_structures.py:485 should probably be an np.array, since it is then used that way (line 499).
MusicExporter is probably missing a condition when adding a line to the overall music, because in some cases I get the following exception:

  File "/home/ikiss/projects/pero/pero-demo/ocr_pipeline.py", line 69, in process_file
    self.music_exporter.process_page(page_layout)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 59, in process_page
    self.export_page_layout(page_layout, page_layout.id)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 64, in export_page_layout
    parts = self.regions_to_parts(
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 120, in regions_to_parts
    part.add_textline(line)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 230, in add_textline
    new_measures_encoded = encode_measures(new_measures, len(self.measures) + 1)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 302, in encode_measures
    measures_encoded.append(measure.encode_to_music21())
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 129, in encode_to_music21
    self.repr = self.encode_to_music21_polyphonic()
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 174, in encode_to_music21_polyphonic
    voices_repr = [voice.encode_to_music21_monophonic() for voice in voices]
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 174, in <listcomp>
    voices_repr = [voice.encode_to_music21_monophonic() for voice in voices]
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 477, in encode_to_music21_monophonic
    self.repr.append(group.encode_to_music21_monophonic())
  File "/home/ikiss/venvs/pero-demo/lib/python3.10/site-packages/music21/stream/base.py", line 2637, in append
    self.coreGuardBeforeAddElement(e)
  File "/home/ikiss/venvs/pero-demo/lib/python3.10/site-packages/music21/stream/core.py", line 440, in coreGuardBeforeAddElement
    raise StreamException(
music21.exceptions21.StreamException: The object you tried to add to the Stream, None, is not a Music21Object.  Use an ElementWrapper object if this is what you intend.

In `smart_sorter.py`:
- if fewer than two engines are filtered, return the original page_layout and not only the split one.
In `music_structures.py`:
- change the type of `lengths` to a numpy array, fix min_length to take from numbers and not names.
- ensure `encoded_group` is not None before appending it to the voice.
Full comment: [pero-ocr/pull/56/#issuecomment-2245202776](#56)
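The None guard from the last point can be illustrated without music21. In this sketch a plain list stands in for the voice stream, and `append_encoded` is a hypothetical helper, not the code in `music_structures.py`.

```python
from typing import Any, List, Optional


def append_encoded(voice: List[Any], encoded_group: Optional[Any]) -> bool:
    """Append an encoded symbol group unless encoding returned None."""
    if encoded_group is None:
        # Skip groups that could not be encoded instead of raising later.
        return False
    voice.append(encoded_group)
    return True
```

This avoids the StreamException from the traceback above, which music21 raises when None is appended to a Stream.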


vlachvojta added 20 commits September 1, 2023 15:20

Add scripts for exporting music from PageLayout to MIDI + MusicXML fi…

fc73bc9

…les.

Add translator dictionary, defining translation from internal shorten…

bda025a

…ed labels (model output) to more verbose format usable by `export_music.py`.

Add base for LayoutEngineYolo using ultralytics YOLO. With conversion…

fe9aceb

… to PageLayout.

Junk-code cleanup and docu.

2da0f24

Prepare attributes for music-text distinction. WITHOUT exports to Pag…

2f661d2

…eXML or ALTO.

Add support for region.music_region. WITHOUT exports to PageXML or …

5373283

…ALTO.

Add category attribute to region. WITH saving to pageXML custom tag +…

1b9b676

… loading.

Add category attribute to TextLine. WITH saving to pageXML custom tag…

7cc5293

… + loading.

Little refactoring.

fae66ce

Add exporting music directly in parse_folder.py using config.ini …

b02ebb2

…to define settings and `page_parser.py` to create music exporter object of `music/export_music/ExportMusicPage`.

Fix minor issue with non-existing function get_las_region_id

c07ba6a

Add sorting music regions "in reading order" using y_min of boundin…

0afb958

…g box around polygon.

Remove RegionCategory and LineCategory enums hard-coded in `layou…

9371467

…t.py`. Get names only from Yolo `result.names`.

Update MusicExporter to export music only from certain categories o…

b733c74

…f lines.

Remove music exporter option from parse_folder.py and make `export_…

b40e479

…music.py` a stand-alone script.

Add option to have more LineCroppers and ORC engines. Set every other…

7b93d16

… text Layout engine to work only with 'text' lines.

Disable throwing error if no crop for line. Continue and ignore line.

50418dd

Add PageLayout splitting enabling running multiple layout parsers eac…

8853046

…h with its own setting and set of categories to work with.

Remove unused functions.

5ce86d1

Merge branch 'develop' into music

16e8ce3

# Conflicts: # pero_ocr/core/layout.py

vlachvojta requested a review from ikiss-fit November 3, 2023 17:02

vlachvojta and others added 9 commits December 1, 2023 16:16

Add simple script to check if page layouts in two folders have same s…

3e2fcb8

…tructure.

Merge remote-tracking branch 'origin/develop' into music

a49409c

Disable double logging (stdout + stderr)

94bcae8

Refactor page xml export + import.

d688bc0

Refactor alto xml export.

65eacb5

Merge remote-tracking branch 'origin/develop' into music

bb88217

Minor updates

3354a5b

Unify most of method names: page_xml to pagexml and alto_xml to altoxml.

59dd330

Unify most of the method names: page_xml to pagexml and alto_xml to a…

3c3322e

…ltoxml. part 2.

Add config parameter UPDATE_TRANSCRIPTION_BY_CONFIDENCE

70a7e35

Parameter sets whether PageOCR should update to the new line: - every time (false) - only if better confidence (true). Applies when rerunning OCR on a previously transcribed line.

vlachvojta added 16 commits June 17, 2024 14:35

Add ALTO baseline (export + import) in two options (float or points)

9dcd33f

- Versions older than 4.2 define the baseline as a simple float (that's where the original baseline comes from). - Version 4.2 and newer define the baseline as a PointsType string with the recommended format: "x1,y1 x2,y2 ..."

Save polygon points only as positive numbers. (XSD validation issue)

2141002

Remove prints.

34c6584

Allow run when at least one ORC engine provide_ctc_logits.

82b3e70

Fix README.md example + delete false info about setup.py.

2aed4bc

Update README.md - spelling correction.

4efdbab

Add typing Optional to allow lower versions of Python (tested on Pyth…

134f51a

…on3.9)

Add libraries needed to install in docker installation.

25f555c

Update texts for better UX.

26b0c1a

Make default version of ALTO to the older one.

3ce8bbc

Add typing List and Tuple to allow lower versions of Python (tested o…

3ec7902

…n Python3.9)

Fix page_xml "custom" field export to export category only if not None.

520e3ae

Set category filter fallback to [] for backward compatibility.

9b414a0

Old XMLs on input don't have category => line.category = None, OCR (and others) have to be set to `[]` by default to process ALL PAGES.

Add libraries back to pyproject.toml, so new machines install it righ…

f5a7a51

…t away.

vlachvojta and others added 9 commits August 5, 2024 13:39

Merge remote-tracking branch 'origin/develop' into music

62af812

Add regions to splitting by category. If region.category set, move …

b60196f

…whole region to positive or negative (ignore categories of lines inside the region)

Add better None check.

4d2ddaa

Disable exporting midi lines if no notes on the line.

7c4251e

Export multirest as a simple default 'whole' rest.

Merge branch 'develop' into music

d9c64cd

Simplify splitting page layouts to allow backwards (only look at regi…

fa1a897

…on category, None = 'text')

Add IndexError to catch expression when calculating transcription con…

f5f2f42

…fidence -- in case when there are no logits (i.e. logits.shape[0] == 0) the confidence cannot be calculated.

Update layout.py

747e491

ikiss-fit merged commit 02e3d7a into develop Nov 14, 2024

ikiss-fit deleted the music branch November 14, 2024 16:16
